Thesis - Vasileios Porpodas
Author
Abstract
Very Long Instruction Word (VLIW) processors are wide-issue, statically scheduled processors. Instruction scheduling for these processors is performed by the compiler and is therefore a critical factor in their operation. Some VLIWs are clustered, a design that improves scalability to higher issue widths while also improving energy efficiency and operating frequency. This design is based on physically partitioning the shared hardware resources (e.g., the register file). Such designs further increase the challenges of instruction scheduling, since the compiler has the additional tasks of deciding which cluster each instruction is placed on and orchestrating the data movements across clusters. In this thesis we propose instruction scheduling optimizations for energy-efficient VLIW processors. Some of the techniques aim at improving the existing state-of-the-art scheduling techniques, while others aim at using compiler techniques to close the gap between lightweight hardware designs and more complex ones. Each of the proposed techniques targets an individual feature of energy-efficient VLIW architectures. Our first technique, called Aligned Scheduling, makes use of a novel scheduling heuristic for hiding memory latencies in lightweight VLIW processors without hardware load-use interlocks (Stall-On-Miss, SOM). With Aligned Scheduling, a software-only technique, a SOM processor coupled with non-blocking caches can better cope with cache latencies and can perform closer to the heavyweight designs. Performance is improved by up to 20% across a range of benchmarks from the Mediabench II and SPEC CINT2000 benchmark suites. The rest of the techniques target a class of VLIW processors known as clustered VLIWs, which are more scalable, more energy efficient, and operate at higher frequencies than their monolithic counterparts.
The second scheme (LUCAS) is an improved scheduler for clustered VLIW processors that addresses the high sensitivity of existing state-of-the-art schedulers to inter-cluster communication latency. The proposed unified clustering and scheduling technique is a hybrid scheme that switches, instruction by instruction, between the two state-of-the-art clustering heuristics, leading to better schedules than either of them produces alone. It generates better-performing code than the state-of-the-art for a wide range of inter-cluster latency values on the Mediabench II benchmarks. The third technique (called CAeSaR) is a scheduler for clustered VLIW architectures that minimizes inter-cluster communication by local caching and reuse of already
Similar resources
DRIFT: Decoupled CompileR-Based Instruction-Level Fault-Tolerance
Compiler-based error detection methodologies replicate the instructions of the program and insert checks wherever it is needed. The checks evaluate code correctness and decide whether or not an error has occurred. The replicated instructions and the checks cause a large slowdown. In this work, we focus on reducing the error detection overhead and improving the system’s performance without degra...
Aligned Scheduling: Cache-Efficient Instruction Scheduling for VLIW Processors
The performance of statically scheduled VLIW processors is highly sensitive to the instruction scheduling performed by the compiler. In this work we identify a major deficiency in existing instruction scheduling for VLIW processors. Unlike most dynamically scheduled processors, a VLIW processor with no load-use hardware interlocks will completely stall upon a cache-miss of any of the operations...
Ion Channel Modulation by Photocaged Dioctanoyl PIP2
ION CHANNEL MODULATION BY PHOTOCAGED DIOCTANOYL PIP2 by Junghoon Ha, BSc. A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at Virginia Commonwealth University. Virginia Commonwealth University, 2009 Thesis Director: Diomedes E. Logothetis, Ph.D. Chair, Department of Physiology and Biophysics Phosphatidylinositol bisphosphate (PIP2) directly regul...
Mobile Agents: Application to services in mobile computing and architectural improvements
Mobile and more generally distributed computing are two basic research areas of Information Technology since they find many applications in our daily life, mainly due to the rapidly increasing growth of technologies of mobile networks and the Internet. The penetration of these technologies in our contemporary life has by far changed the way we work, communicate, seek information and organize ou...
Embedded Testing Architectures
Tenentes, Vasileios, PhD Department of Computer Science & Engineering, University of Ioannina, Greece.